
    Automatic vs Manual Provenance Abstractions: Mind the Gap

    In recent years, the need to simplify or to hide sensitive information in provenance has given rise to research on provenance abstraction. In the context of scientific workflows, existing research provides techniques to semi-automatically create abstractions of a given workflow description, which are in turn used as filters over the workflow's provenance traces. An alternative approach commonly adopted by scientists is to build workflows with abstractions embedded into the workflow's design, such as using sub-workflows. This paper reports on a comparison of manual versus semi-automated approaches in a context where result abstractions are used to filter report-worthy results of computational scientific analyses. Specifically, we take a real-world workflow containing user-created design abstractions and compare them with abstractions created by the ZOOM UserViews and Workflow Summaries systems. Our comparison shows that the semi-automatic and manual approaches largely overlap from a process perspective, whereas there is a dramatic mismatch in terms of the data artefacts retained in an abstracted account of derivation. We discuss the reasons for this and suggest future research directions. Comment: Preprint accepted to the 2016 workshop on the Theory and Applications of Provenance, TAPP 2016

    Micropublications: a Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications

    Micropublications is a semantic model for scientific claims, evidence, argumentation and annotation in biomedical publications. It is a metadata model of scientific argumentation, designed to support several key requirements for the exchange and value-addition of semantic metadata across the biomedical publications ecosystem. Micropublications allow the argument structure of scientific publications to be formalized so that (a) their internal structure is semantically clear and computable; (b) citation networks can be easily constructed across large corpora; (c) statements can be formalized in multiple useful abstraction models; (d) statements in one work may cite statements in another, individually; (e) support, similarity and challenge of assertions can be modelled across corpora; and (f) scientific assertions, particularly in review articles, may be transitively closed to supporting evidence and methods. The model supports natural language statements; data; methods and materials specifications; discussion and commentary; as well as challenge and disagreement. A detailed analysis of nine use cases is provided, along with an implementation in OWL 2 and SWRL and several example instantiations in RDF. Comment: Version 4, minor revision
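
    To make the model concrete, here is a minimal, hypothetical sketch of how a single micropublication (a claim supported by a piece of evidence) might be instantiated in RDF with Python's rdflib. The mp: class and property names are illustrative stand-ins, not the exact terms of the published OWL 2 ontology.

```python
from rdflib import Graph, Namespace, Literal, RDF

# Hypothetical namespace; the real Micropublications ontology defines its own terms.
MP = Namespace("http://example.org/micropublication#")
EX = Namespace("http://example.org/data#")

g = Graph()
g.bind("mp", MP)

# A claim expressed as a natural-language statement.
g.add((EX.claim1, RDF.type, MP.Claim))
g.add((EX.claim1, MP.statement, Literal("Drug X reduces expression of gene Y.")))

# A supporting piece of evidence (e.g. a dataset or figure from another article).
g.add((EX.evidence1, RDF.type, MP.Evidence))
g.add((EX.claim1, MP.supportedBy, EX.evidence1))

print(g.serialize(format="turtle"))
```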

    The Software Sustainability Institute: Changing Research Software Attitudes and Practices

    To effect change, the Software Sustainability Institute works with researchers, developers, funders, and infrastructure providers to identify and address key issues with research software.

    myExperiment: a repository and social network for the sharing of bioinformatics workflows

    myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks that use web services, covering everything from data retrieval, integration and analysis through to the visualization of results. As a public repository of workflows, myExperiment allows anybody to discover workflows that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment has attracted over 3500 registered users and contains more than 1000 workflows. The social aspect of sharing these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment, including its REST web service, is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to [email protected]
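
    As an illustration of the REST web service mentioned above, the sketch below queries the public myExperiment API with Python's requests library. The workflows.xml endpoint and element names are assumptions based on the wiki documentation and are not guaranteed to be current.

```python
import requests
import xml.etree.ElementTree as ET

# Query the (assumed) myExperiment REST endpoint for a few workflows.
resp = requests.get("http://www.myexperiment.org/workflows.xml",
                    params={"elements": "title,uploader", "num": 5},
                    timeout=30)
resp.raise_for_status()

root = ET.fromstring(resp.content)
for wf in root.findall("workflow"):
    title = wf.findtext("title", default="(untitled)")
    print(wf.get("resource"), "-", title)
```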

    Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language

    A widely used standard for portable multilingual data analysis pipelines would bring considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and the environment. Published research that used multiple computer languages for its analysis pipelines would include a complete and reusable description of that analysis, runnable on a diverse set of computing environments. Researchers would be able to collaborate and reuse these pipelines more easily, adding or exchanging components regardless of the programming language used; collaborations with and within industry would be easier; and approval of new medical interventions that rely on such pipelines would be faster. Time would be saved and environmental impact reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for modern society. In many domains the analysis components are numerous and written in multiple different computer languages by third parties. However, without a standard for reusable and portable multilingual workflows, reusing published multilingual workflows, collaborating on open problems, and optimizing their execution are severely hampered. Moreover, only a widely used standard for multilingual data analysis pipelines would enable considerable benefits to research-industry collaboration, regulatory cost control, and preservation of the environment. Prior to the start of the CWL project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today, although hundreds of single-vendor and other single-source systems exist that run workflows, none is a general, community-driven, consensus-built standard
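
    As a sketch of what portability means in practice, a CWL description plus an input object can be handed unchanged to any conforming runner. The snippet below simply shells out from Python to the cwltool reference runner, assuming it is installed and that workflow.cwl and job.yml are placeholder files supplied by the user.

```python
import subprocess

# Run a hypothetical CWL workflow with its input object using the reference
# runner; any conforming CWL runner could be substituted here.
result = subprocess.run(
    ["cwltool", "workflow.cwl", "job.yml"],
    capture_output=True, text=True, check=False,
)
print(result.stdout)      # JSON description of the produced outputs
if result.returncode != 0:
    print(result.stderr)
```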

    Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

    BACKGROUND: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the computational tools employed for the analysis of such data. For example, tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic, since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools.

    RESULTS: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data, followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor, which has been developed to invoke R when it has been deployed as a service using the Rserve library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services, as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV-formatted data within the Taverna workbench.

    CONCLUSION: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists who want to use tools such as R to analyse their own data without having to learn the corresponding programming language.
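
    The RShell processor described above delegates computation to an R session exposed through the Rserve library. The sketch below shows the same idea outside Taverna, connecting to a locally running Rserve instance from Python via the pyRserve package; the host, port and toy t-test are illustrative only.

```python
import pyRserve

# Connect to an Rserve instance started beforehand in R with: library(Rserve); Rserve()
conn = pyRserve.connect(host="localhost", port=6311)

# Push a small stand-in for expression values and let R do the statistics,
# mirroring how the RShell processor delegates computation to a remote R session.
conn.r.values = [2.1, 2.3, 8.7, 9.1]
result = conn.eval("t.test(values[1:2], values[3:4])$p.value")
print("p-value:", result)

conn.close()
```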

    Workflow-centric research objects: First class citizens in scholarly discourse.

    A workflow-centric research object bundles a workflow, the provenance of the results obtained by its enactment, other digital objects that are relevant for the experiment (papers, datasets, etc.), and annotations that semantically describe all these objects. In this paper, we propose a model to specify workflow-centric research objects, and show how the model can be grounded using semantic technologies and existing vocabularies, in particular the Object Reuse and Exchange (ORE) model and the Annotation Ontology (AO). We describe the life-cycle of a research object, which resembles the life-cycle of a scientific experiment
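
    A minimal sketch, using rdflib and the public ORE vocabulary, of how such a research object might aggregate a workflow, a dataset and a paper; the resource URIs are invented for illustration and the annotations themselves are omitted.

```python
from rdflib import Graph, Namespace, RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")
EX = Namespace("http://example.org/ro/")

g = Graph()
g.bind("ore", ORE)

ro = EX["research-object-1"]
g.add((ro, RDF.type, ORE.Aggregation))

# Aggregate the experiment's constituent resources (URIs are placeholders).
for resource in ["workflow.t2flow", "dataset.csv", "paper.pdf"]:
    g.add((ro, ORE.aggregates, EX[resource]))

print(g.serialize(format="turtle"))
```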

    Structuring research methods and data with the research object model: genomics workflows as a case study

    Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide the necessary metadata for a scientist to understand and recreate the results of an experiment. To support this, we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study in which we analysed human metabolite variation by workflows.

    Results: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?" and "which particular conclusions were drawn from a particular workflow?".

    Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well.

    Availability: The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/r
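
    The competency questions quoted above can be posed as SPARQL queries over the RO's annotations. The sketch below assumes a small provenance graph serialized in Turtle and uses terms from the Wf4Ever wfprov vocabulary; the data file itself is hypothetical.

```python
from rdflib import Graph

# Load a (hypothetical) provenance graph exported from the research object.
g = Graph()
g.parse("ro-provenance.ttl", format="turtle")

# "Which particular data was input to a particular workflow?"
query = """
PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>
SELECT ?data ?workflow WHERE {
    ?run wfprov:describedByWorkflow ?workflow ;
         wfprov:usedInput ?data .
}
"""
for data, workflow in g.query(query):
    print(f"{data} was input to {workflow}")
```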

    The First Provenance Challenge

    The first Provenance Challenge was set up in order to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions
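
    Although the challenge predates the W3C PROV standard, the kind of provenance representation and queries it asked for can be illustrated with today's Python prov package. The sketch records one step of the fMRI pipeline (align_warp) and the derivation of its output; the identifiers are invented for illustration.

```python
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/challenge#")

# One step of the pipeline: align_warp consumes an anatomy image
# and produces a warped image.
doc.entity("ex:anatomy_image")
doc.activity("ex:align_warp")
doc.entity("ex:warped_image")
doc.used("ex:align_warp", "ex:anatomy_image")
doc.wasGeneratedBy("ex:warped_image", "ex:align_warp")
doc.wasDerivedFrom("ex:warped_image", "ex:anatomy_image")

# A provenance query then amounts to traversing these relations,
# e.g. "which inputs did warped_image ultimately derive from?".
print(doc.get_provn())
```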

    XGAP: a uniform and extensible data model and software platform for genotype and phenotype experiments.

    We present XGAP, an extensible software model for the genotype and phenotype community. Readers can download a standard XGAP (http://www.xgap.org) or auto-generate a custom version using MOLGENIS, with programming interfaces to R software and web services, or user interfaces for biologists. XGAP has simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data. Current functionality includes tools ranging from eQTL analysis in mouse to genome-wide association studies in humans.
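
    As an illustration of what a simple load format might look like, the sketch below parses a hypothetical tab-delimited matrix of metabolite phenotype values (one row per trait, one column per sample). The layout is an assumption for illustration, not the exact XGAP specification.

```python
import csv
import io

# Hypothetical tab-delimited matrix: first column holds trait names,
# remaining columns hold per-sample values.
raw = """trait\tsample1\tsample2\tsample3
metabolite_A\t0.42\t0.51\t0.39
metabolite_B\t1.10\t0.95\t1.22
"""

matrix = {}
reader = csv.reader(io.StringIO(raw), delimiter="\t")
samples = next(reader)[1:]
for row in reader:
    matrix[row[0]] = dict(zip(samples, map(float, row[1:])))

print(matrix["metabolite_A"]["sample2"])  # 0.51
```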